Robust Estimation of cDNA Microarray Intensities with Replicates
Abstract
We consider robust estimation of gene intensities from cDNA microarray data with replicates. Several statistical methods for estimating gene intensities from microarrays have been proposed, but there has been little work on robust estimation of the intensities. This is particularly relevant for experiments with replicates, because even one outlying replicate can have a disastrous effect on the estimated intensity for the gene concerned. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a Bayesian hierarchical model for robust estimation of cDNA microarray intensities. Outliers are modeled explicitly using a t-distribution, and our model also addresses classical issues such as design effects, normalization, transformation, and nonconstant variance. Parameter estimation is carried out using Markov chain Monte Carlo. The method is illustrated using two publicly available gene expression data sets. The between-replicate variability of the intensity estimates is reduced by 64% in one case and by 83% in the other compared to raw log ratios. The method is also compared to the ANOVA-normalized log ratio, the removal of outliers based on Dixon's test, and the lowess-normalized log ratio, and the between-replicate variation is reduced by more than 55% relative to the best of these methods for both data sets. We also address the issue of whether the image background should be removed when estimating intensities. It has been argued that one should not do so because it increases variability, while the arguments for doing so are that there is a physical basis for the image background, and that not doing so will bias the estimated log-ratios of differentially expressed genes downwards.
We show that the arguments on both sides of this debate are correct for our data, but that by using our model one can have the best of both worlds: one can subtract the background without greatly increasing variability.
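The sensitivity to a single outlying replicate, and the way a t-distribution dampens it, can be illustrated with a small sketch. This is not the Bayesian hierarchical model or the MCMC procedure described in the abstract. It is a simplified stand-in, assuming a plain Student-t location-scale model with fixed degrees of freedom fitted by EM. The function name `t_location_em`, the choice `df=4.0`, and the replicate values are all hypothetical and chosen only for illustration.

```python
def t_location_em(values, df=4.0, iters=200):
    """Fit the location and scale of a Student-t model with fixed df by EM.

    Simplified illustration only: the abstract's method is a Bayesian
    hierarchical model fitted by MCMC, which this does not reproduce.
    """
    n = len(values)
    mu = sum(values) / n                      # start from the ordinary mean
    var = max(sum((v - mu) ** 2 for v in values) / n, 1e-12)
    for _ in range(iters):
        # E-step: observations far from mu (relative to var) get small weights
        weights = [(df + 1.0) / (df + (v - mu) ** 2 / var) for v in values]
        # M-step: weighted mean and weighted variance
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        var = max(sum(w * (v - mu) ** 2 for w, v in zip(weights, values)) / n, 1e-12)
    # Recompute the weights at the final parameter values for reporting
    weights = [(df + 1.0) / (df + (v - mu) ** 2 / var) for v in values]
    return mu, var, weights


# Hypothetical log ratios for one gene: four concordant replicates, one outlier
replicates = [1.02, 0.95, 1.10, 0.98, 4.50]

plain_mean = sum(replicates) / len(replicates)
robust_mu, robust_var, weights = t_location_em(replicates)

print("ordinary mean:", round(plain_mean, 3))
print("t-based location:", round(robust_mu, 3))
print("weights:", [round(w, 3) for w in weights])
```

In this toy setting the ordinary mean is pulled strongly toward the outlying value, while the t-based estimate stays close to the four concordant replicates because the outlier receives a small weight. The full model in the abstract additionally handles design effects, normalization, transformation, and nonconstant variance, none of which are represented here.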
Similar resources
Quality Control and Robust Estimation for cDNA Microarrays with Replicates
We consider robust estimation of gene intensities from cDNA microarray data with replicates. Several statistical methods for estimating gene intensities from microarrays have been proposed, but there has been little work on robust estimation. This is particularly relevant for experiments with replicates, because even one outlying replicate can have a disastrous effect on the estimated intensity...
Donuts, scratches and blanks: robust model-based segmentation of microarray images
Motivation: Inner holes, artifacts and blank spots are common in microarray images, but current image analysis methods do not pay them enough attention. We propose a new robust model-based method for processing microarray images so as to estimate foreground and background intensities. The method starts with a very simple but effective automatic gridding method, and then proceeds in two steps. Th...
Bayesian Robust Inference for Differential Gene Expression in cDNA Microarrays with Multiple Samples
We consider the problem of identifying differentially expressed genes under different conditions using cDNA microarrays. Standard statistical methods cannot be used because typically there are thousands of genes and few replicates. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an ou...
Identification and Normalization of Plate Effects in cDNA Microarray Data
Introducing a new way of visualizing cDNA microarray data, we have identified a new type of systematic variation, which we refer to as plate effects. We believe that plate effects are due to non-biological differences in the cDNA clone products spotted onto the microarray slides. By comparing the consistency of all replicates (both within and between slides) after performing 42 different normal...
A probabilistic framework for microarray data analysis: fundamental probability models and statistical inference.
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to t...